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Abstract 

This paper presents an R package EMMIXcskew for the fitting of the canonical funda¬ 
mental skew t-distribution (CFUST) and finite mixtures of this distribution (FM-CFUST) 
via maximum likelihood (ML). The CFUST distribution provides a flexible family of mod¬ 
els to handle non-normal data, with parameters for capturing skewness and heavy-tails in 
the data. It formally encompasses the normal, t, and skew-normal distributions as special 
and/or limiting cases. A few other versions of the skew t-distributions are also nested 
within the CFUST distribution. 

In this paper, an Expectation-Maximization (EM) algorithm is described for comput¬ 
ing the ML estimates of the parameters of the EM-CFUST model, and different strategies 
for initializing the algorithm are discussed and illustrated. The methodology is imple¬ 
mented in the EMMIXcskew package, and examples are presented using two real datasets. 

The EMMIXcskew package contains functions to fit the EM-CEUST model, including 
procedures for generating different initial values. Additional features include random 
sample generation and contour visualization in 2D and 3D. 

Keywords: mixture models, fundamental skew distributions, skew normal distribution, skew 
t-distribution, EM algorithm, R. 


1. Introduction 

Finite mixture models, in particular normal mixture models, have been widely used in statis¬ 
tics and a diverse range of applied helds such as bioinformatics, biomedicine, economics, 
finance, genetics, image analysis, psychometrics, and social science. They provide a power¬ 
ful and flexible tool for the probabilistic modelling of data, with applications ranging from 
density estimation to clustering, classification, and discriminant analysis. For a survey on mix¬ 
ture models and their applications, see Everitt and Hand (1981), Titterington et al. (1985), 
McLachlan and Basford (1988), Lindsay (1995), Bohning (2000), McLachlan and Peel (2000), 
and Friihwirth-Schnatter (2006), the edited volume of Mengersen et al. (2011), and also the 
papers by Banfield and Raftery (1993) and Fraley and Raftery (1998). 
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In recent years, mixture models with skew component distributions have received increasing 
attention. These models adopt densities that can take more flexible distributional shapes than 
the traditional normal and t-distributions as component distributions, rendering them suitable 
for a wider range of applications. Of these, the skew t-distribution is gaining popularity due 
to its ability to handle both the asymmetry and heavy-tailedness in the data. In particular, 
a number of different formulations of skew t-distribution have been used extensively in the 
model-based clustering literature (see, for example, Lee and McLachlan (2014a, 2016a), Azza- 
lini et al. (2016), McLachlan and Lee (2016) and the references therein). They have also found 
many applications in a range of fields, including astrophysics (Riggi and Ingrassia 2013), finan¬ 
cial risk analysis and modelling (Soltyk and Gupta 2011, Bernardi 2013, Lee and McLachlan 
2013b, Abanto-Valle et al. 2015), fisheries science (Contreras-Reyes and Arellano-Valle 2013), 
flow cytometry (Pyne et al. 2009, Friihwirth-Schnatter and Pyne 2010, Rossin et al. 2011, Ho 
et al. 2012, Hu et al. 2013, Pyne et al. 2014, Lee et al. 2014, Lin et al. 2016, 2015, Lee et al. 
2016c, Pyne et al. 2015), image segmentation (Lee and McLachlan 2013a), pharmaceutical 
science (Schaarschmidt et al. 2015), and the social sciences (Muthen and Asparouhov 2014, 
Asparouhov and Muthen 2016). For a comprehensive survey of skew distributions, see, for 
example, the articles by Azzalini (2005), Arellano-Valle and Azzalini (2006), Arellano-Valle 
et al. (2006), the book edited by Genton (2004), and the recent monograph by Azzalini and 
Capitanio (2014). 

Recently, Lee and McLachlan (2016a) introduced a finite mixture of canonical fundamental 
skew t (FM-CFUST) distribution. This formulation of the skew t-distribution has a gen¬ 
eral p X q matrix of skewness of parameters (Arellano-Valle and Genton 2005). It thus pro¬ 
vides a more general characterisation than the restricted and unrestricted skew t-distributions 
(adopting the terminology of Lee and McLachlan (2013c)). This paper describes an R pack¬ 
age EMMIXcskew for the fitting of the FM-CFUST model. It implements the EM algorithm 
described in Lee and McLachlan (2016a) and provides other functionalities such as random 
sample generation, density evaluation, and the plotting of contours in 2D and 3D. 

The remainder of this paper is organized as follows. Section 2 provides a brief description of 
the CFUST distribution and its nested models. Section 3 outlines an EM algorithm for fitting 
finite mixtures of CEUST distributions and examines different approaches for generating 
starting values for this EM algorithm. In the next two sections, the usage of the EMMIXcskew 
package is illustrated using real and simulated examples. Finally, we conclude with some brief 
remarks in Section 6. 


2. The CFUST and related distributions 


To establish notation, let Y be p-dimensional random vector that follows a multivariate 
CFUST distribution, denoted by V ~ CFUSTpq{fj,, Xl, A, u). Then the density of Y is given 
by 



( 1 ) 
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where 

n = S + AA^, 

c{y) = (y - m) , 

A = /g-A^fi-^A, 

d{y) = {y - fi)^ {y - fr) . 

It can observed from (1) above that the CFUST distribution is described by the parameters 
{fx, S, A, v), where /i is a p-dimensional vector of location parameters, S is a positive definite 
scale matrix, A is a p x g matrix of skewness parameters, and n is a scalar degrees of freedom 
that regulate the tails of the distribution. In the above, we let tp (y, fi, n) denote the p- 
dimensional t-distribution with location parameter fx, scale matrix fl, and degrees of freedom 
ix, and Tq{.) is the g-dimensional (cumulative) t-distribution function. 

As mentioned previously, the CFUST distribution includes some commonly used distributions 
as special and/or limiting cases. Taking A = 0 reduces (1) to the symmetric multivariate 
t-density tp{fx,Ll,ix), and further letting ix ^ oo and ix = 1 leads to the multivariate normal 
Np{fx, fi) and Cauchy Cp{fx, 11) distributions, respectively. If A is constrained to be a diagonal 
matrix, we obtain the skew t-distribution of Sahu et al. (2003) which is referred to as the 
unrestricted skew f-distribution using the terminology in Lee and McLachlan (2014a, 2013c). 
To obtain the classical skew t-distribution by Azzalini and Capitanio (2003) from (1), one can 
set q = 1 or take A to be a matrix of zeros except for one column (Lee and McLachlan 2016a). 
This formulation of the skew t-distribution, referred to as the restricted skew t-distribution, 
is equivalent to that given by Branco and Dey (2001), Gupta (2003), Lachos et al. (2010) 
and Pyne et al. (2009); see Lee and McLachlan (2013c). Analogously, the restricted and 
unrestricted skew normal distributions can be obtained by placing appropriate constraints 
on A and letting ix ^ oo. Some properties of the CFUST distribution are described in 
Arellano-Valle and Genton (2005). It is of interest to note that this distribution suffers an 
identifiability issue as discussed in Lee and McLachlan (2016). In brief, this means that the 
GFUST density is invariant under permutations of the columns of the skewness matrix A, 
but this does not affect parameter estimation. Hence, in practice, the user only needs to be 
aware that changing the order of the columns of A does not affect the density of the GFUST 
model. 

There are several R packages on CRAN that deal with (multivariate) mixture models with 
skewed component densities. In particular, the restricted and unrestricted versions of skew 
t-mixture models are implemented in EMMIXskew (Wang et al. 2009) and EMMIXuskew 
(Lee and McLachlan 2013d), respectively. The package mixsmsn (Prates et al. 2013) im¬ 
plements the family of finite mixtures of scale-mixture of skew-normal distributions, which 
includes a skew normal distribution and a skew t-distribution that is equivalent to the re¬ 
stricted skew normal distribution and restricted skew t-distribution, respectively. However, 
the estimation procedure used in mixsmsn imposes the condition that all components of the 
skew f-mixture model share a common value for the degrees of freedom. A recently developed 
package MixGHD (Tortora et al. 2015) provides functions to fit finite mixtures of generalized 
hyperbolic distributions. For the classical multivariate skew-normal and skew t-distributions, 
the sn package (Azzalini 2014) can be used. For traditional normal mixture model and related 
tools, a number of other packages are available on GRAN, such as bgmm (Biecek et al. 2012), 
fiexmix (Leisch 2004, Griin and Leisch 2008), mclust (Fraley and Raftery 2007, Scrucca et al. 
2016), and mixtools (Benaglia et al. 2009). 
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3. Fitting mixtures of CFUST distributions via the EM algorithm 

The density of a finite mixture model is given by a convex linear combination of component 
densities. More formally, adopting the CFUST distribution as component densities, we obtain 
a finite mixture of CFUST (FM-CFUST) distribution with density given by 

9 

h=l 


where Xh {h = are the mixing proportions and /(.) denotes the CFUST den¬ 
sity given by (1). The mixing proportions satisfy vt/j > 0 and Ylh=i'^h ~ vector 

T' = (tti, ..., TTg-i, 6i ,..., contains all the unknown parameters of the model, with Oh 
containing the elements of and Sh, the distinct elements of Eh, and Vh- 

For the fitting of the FM-CFUST model, we employ the EM algorithm (Dempster et al. 1977) 
to compute the maximum likelihood (ML) estimate of the parameters of the model. The EM 
algorithm proceeds by alternating repeatedly between the E- and M-steps until the changes 
in the log likelihood values are less than some specified small value in the case of convergence. 

To facilitate parameter estimation via the EM algorithm, a set of latent variables are in¬ 
troduced, namely the component labels Zj (corresponding to the j = 1 ,... ,n independent 
observations on V), alongside two random variables Uj and Wj that follow a half-normal 
distribution and a gamma distribution, respectively. Thus, the complete-data for the FM- 
CFUST model consist of these missing variables and the observations yj. This leads to a 
four-level hierarchical characterization of the FM-CFUST model, given by 

Y j\uj,Wj,Zhj = l ~ 

Uj\wj,Zhj = l ~ 

Wj I Zhj = 1 ~ 

Zj ~ 

where Zj is a gf-dimensional vector of binary component labels such that Zhj = {Zj)h = 1 
if the jth observation belongs to the hth component and zero otherwise. Here, F[Nq(0,E) 
denotes the ( 7 -dimensional half-normal distribution with scale matrix S, gamma((a, /?) denotes 
the gamma distribution with mean and Multg(l, 7 r) denotes the multinomial distribution 
with one draw and g categories with probabilities specified by tt. We now outline the E- and 
M-steps of the EM algorithm for htting the FM-CFUST model. 

3.1. The E-step 

The E-step of the EM algorithm requires the calculation of the so-called Q-function, 
which is the conditional expectation of the complete-data log likelihood given the observed 
data y, using the current estimate of T', which is denoted by after the kth iteration. 
It follows that on the {k + l)th iteration, the E-step requires the following five conditional 


Vh\ 

gamma — j , 

Multg(l,7r), (3) 
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expectations to be calculated, 


(fc) 

^hj 

(fc) 

(k) 

Hhj 

Ak) 

^2hj 

Jk) 

^3hj 


Eq,(k) [zhj = 1 I Vj] , 

E^(k) [whj I Vj, Zhj = l] , 

E^(k) [log{whj) I yj,Zhj = 1] , 

E\^{k) [ic/ijii/ijf I yj,Zhj — l] , 


E^(k) 


Whj'^hj'^hj 


yj,Zhj = 1 


(4) 

(5) 

( 6 ) 

(7) 

( 8 ) 


The expressions for (4) to (8) are given in Lee and McLachlan (2016a) and therefore not 
repeated here. However, it should be noted that can be evaluated using different ap¬ 
proaches, two of which are described in the above reference. For simplicity, the EMMIXcskew 
package implements the one-step-late (OSL) approach for this conditional expectation. It 
should be noted that the use of the approximate OSL approach to calculate can result in 
the incomplete-data likelihood not increasing monotonically. This conditional expectation can 
be calculated more accurately by a power series derived in Lee and McLachlan (2014b, 2016a) 
for which monotonicity of the likelihood is preserved. An implementation of this option will 
be provided in a future update of this package. 


3.2. The M-step 

On the (fe -|- l)th iteration of the the M-step, the current estimate of 'I', is updated to 

which is chosen to globally maximize over 'J'. For the FM-CFUST model, 

the M-step leads to the following updates: 


vr 


(fc+i) _ 




. (k+l) 


n ^ 


(k) 

hj ’ 


i=i 


(k) Jk)(k) 


V” Zk-w^’v'-A'-’T"' z^’e 




E n 

.7 = 1 


1=1" '^hj 


E4‘’(«.-Mr‘')4i>' 

1=1 


E 

1=1 


Ak)Jk) 

^hj ^3hj 


-1 


(9) 




E 

1=1 


,(fc) 

'hj 


(k) ( (fc-rl)\ ( {k+l)\'^ 

Kj [Vj -Hr ) [Vj -Hh ) 


{k+l)ik) 


.{k+l)\ 


T 


_ Jky js.{k+iy 


2hj \yj -Hh ) -\yj-Hh e^hj 


^3hj 


E 4 ? 

1=1 


-1 


( 10 ) 


An update of the degrees of freedom Uh is obtained by solving the following equation for 
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(k+i) 


0 



E 

i=i 


(fc) 


hj 


— w 


(k) 

hj 


where V’(') denotes the digamma function. 


3.3. Generating initial values for parameters 

As the log likelihood function may exhibit a complicated prohle with many local maxima 
and the EM algorithm is sensitive to its initial values, it is important to choose good start¬ 
ing values. In this section, we consider three strategies for generating valid initial values for 
the EM algorithm for the FM-CFUST model. For the remainder of this section, we sup¬ 
press the subscript h (denoting the index of a component in a mixture model) for notational 
convenience. 


Nested approach 

An intuitive approach is to start the EM algorithm with the solution given by one of the nested 
models of a CFUST distribution, for example, the results from fitting a normal or t-mixture 
model. This option is available in EMMIXcskew with the fmcfust. init function (see Section 
5.2), which accepts the outputs from the packages EMMIXskew and EMMIXuskew. The 
former package provide routines to fit mixtures of (multivariate) normal, t-, (restricted) skew 
normal, and skew t-distributions, whereas the latter package hts a mixture of unrestricted 
skew t-distributions. 


Method of moments-based approach 

Another approach is based on the moments of an unrestricted multivariate skew normal 
(uMSN) distribution. As noted earlier, the uMSN distribution is a nested case of the CFUST 
distribution. It can be characterized as the convolution of a truncated normal variable and a 
multivariate normal variable as follows, 

Yj = p. + X\U Qj\ + U ij ^ (11) 

where /x G M, A = diag(<5) is a diagonal matrix of skewness parameters with diagonal elements 
given by 8, Uoj Np{0, Ip) and Uij ~ Np{0, S). The uMSN distribution (11) has mean and 
variance given by 


E{Yj) = + 

and 

covfYj) = S + (1--)A2, (12) 

TT 

respectively. On rearranging the above expressions, we obtain an expression for p and X in 
terms of 8 (recall that by definition A = diag(5) for the uMSN distribution). To obtain an 
initial value for one can consider reducing the values of the diagonal elements of by 
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an arbitrary proportion (1 — a) where a G [0,1] (Lin 2010). This leads a set of expressions 
given by 


^(0) 




V vr - 2 


X;(o) 

= S'-1-(a - 1) diag(s*). 




(13) 


where the sign of each element of is given by the sign of the third-order sample moment 
of the corresponding variable about its sample mean. In (13) above, s* is a p-dimensional 
vector containing the diagonal elements of the sample covariance matrix S, and y denotes the 
sample mean. Concerning the degrees of freedom, it can be set (initially) to a large number 
to reflect a uMSN distribution. 

Transformation approach 

A third approach is based on a transformation of Yj in an attempt to better handle the 
correlation of the variables in Yj. We consider the transformation vector Xj = CYj, where 
C is an orthogonal matrix such that the covariance matrix of Xj, cov{Xj), is diagonal. 
In practice, we work with the sample covariance matrix of Yj. Then we can fit a uMST 
distribution to the transformed vector Xj, where each Xj can be characterized as 

W, = CY, 

= y, + X\Uoj\ + Uij, (14) 

where Uoj and Uij follow a central multivariate t-distribution with n degrees of freedom 
and scale matrix given by Iq and XI, respectively. Note that in (14) above, we have used a 
stochastic representation of the CFUST distribution that is analogous to (11) for a uMSN 
distribution. On pre-multiplying Xj by C~^ in (14) to obtain Yj, we have 

Yj = y + C^ X\UQj\ + C'^Uij. (15) 

This suggests that an initial value for y and for A in a CFUST distribution can be given by 
y and C'"'" A, respectively, where y and A are the estimates of y and A obtained by fitting 
the uMST distribution to the Xj. However, it should be noted that if the true distribution 
of Yj were a CFUSN distribution, then the transformed data Xj may not necessarily have 
a uMSN distribution even though cov(Xj) is diagonal. This happens when the off-diagonal 
elements of XI cancel out the off-diagonal elements of AA^. But it might be expected that 
in most situations where the sample covariance matrix of the Xj is approximately diagonal, 
the matrix XI and the skewness matrix A are both diagonal or close to it. 

In the case where a mixture of CFUST distributions is to be fitted rather than a single 
component distribution, we would need to first cluster the Yj into g clusters and proceed 
separately within each cluster as described above. 

The above three methods are implemented in the function f mcf ust. init of the EMMIXcskew 
package. By default, it adopts the moments method. An example on a real dataset demon¬ 
strating the use of these approaches is given in Section 5.2. 
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3.4. Stopping criterion 

We adopt the Aitken acceleration-based stopping criterion as the default strategy to assess the 
convergence of the EM algorithm for the FM-CFUST model. The EMMIXcskew package also 
provides a few other criteria as an alternative, including those based on the relative change 
in the log likelihood value and estimates of the parameters of the model. 


Aiken acceleration-based approach 

In brief, when using the default strategy, our algorithm terminates when the absolute differ¬ 
ence between a log likelihood value and its asymptotic estimate is smaller than a specified 
tolerance, e; that is, when 


fdk+l) _ £(k+l) 


< e, 


(16) 


where is the log likelihood value at the {k -f l)th iteration and is its asymptotic 

estimate, given by 


£{k+l) _ £(k) _|_ 


£(k+l) _ £(k) 
1 - 


(17) 


/ 7 \ pikj I 11 pi 

In the above, a^^’ = denotes the Aitken’s acceleration at the A:th iteration (Bohning 

et al. 1994, McLachlan and Krishnan 1997). The default tolerance of e = 10“® is applied to 
the examples in the following sections, but the user can specify a different value. 


Relative likelihood-based approach 

Another commonly used stopping criterion is to monitor the relative changes in the log like¬ 
lihood values at the end of each iteration and to stop the algorithm when the (relative) 
difference between two successive log likelihood values is less than a specified threshold. More 
formally, our algorithm terminates when 

< e, 

where the threshold e is set to 10“® by default. Again, the user can specify a different 
threshold. 



(18) 


Relative parameters-based approach 

Apart from tracking the changes in the log likelihood value, one can also monitor the changes 
in the parameter estimates. In this case, the algorithm is considered to have converged when 
the relative change in all the parameter estimates is less than a specified threshold e. Note 
that this criterion implies that the relative changes of all the free parameters needs to be 
smaller than e, that is, all elements of 4^ including the mixing proportions. Thus, the EM 
algorithm terminates when 

^(fc+i) _ 

^(fc+i) 


< e. 


(19) 
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Again, the default tolerance is e = 10 

3.5. Notes on the EM implementation 

In the EM algorithm described in this paper, the number of components g and the dimension 
q of the skewing vector must be specified. In practice, these are typically unknown and model 
selection criteria are employed to aid in choosing an appropriate value of g and q. Some of the 
more commonly used information criterion include the Bayesian information criterion (BIG; 
Schwarz (1978)), given by 


BIC = mlogn —2^, (20) 

and the Akaike information criterion (AIC; Akaike (1974)), given by 

BIC = 2m- 21, (21) 

where m is the number of free parameters, n is the number of observations, and i is the value 
of the log likelihood function at the fitted parameter vector. As is typical in fitting a mixture 
of factor analyzers (MFA) model, one may fit the FM-CFUST model for a range of values of 
p and q and choose the combination of p and q corresponding to the lowest AIC or BIC. An 
alternative strategy for automatically selecting an appropriate g was considered in Lee and 
McLachlan (2016c) which is based on the minimum message length (MML) criteria. However, 
concerning the value of q, it was observed in Lee and McLachlan (2016a) that when fitting 
the CFUST model with q = p it was able to model data generated from the rMST (g = 1) 
and uMST {q = p and A is a diagonal matrix) distributions quite well. In particular, the 
estimated A matches the true A reasonably well. For example, in the case of data generated 
from a rMST distribution, all but one of the columns of the estimated A has elements being 
relatively small, thus resembling the q = t case. Hence, in the implementation of the EM 
algorithm in the EMMIXcskew package, the default value of q is set to p. But the user is 
encouraged to experiment with different values of q when fitting the FM-CFUST model. 


4. Using the EMMIXcskew package 

The EMMIXcskew package implements the EM algorithm described in Section 3 and provides 
additional functions such as random sample generation, density evaluation, and graphics 
outputs. The software is primarily written in R. EMMIXcskew will be made available on the 
Comprehensive R Archive Network (CRAN). 

In this sequel, we now demonstrate the basic usage of the EMMIXcskew package using simple 
examples. In particular, the following main routines are discussed: 

• dfmcfust: evaluation of density values; 

• rfmcfust: generation of a random sample from a FM-CFUST distribution; 

• fmcfust: fitting a FM-CFUST model; 

• init.cfust: generation of initial values for use in fmcfust; 

• fmcfust. contour.2d: plotting a 2D graph of the contours of a FM-CFUST model; 
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Parameters 

R arguments 

Dimensions 

Description 


mu 

px\x g 

the location parameters 

S 

sigma 

p X p X g 

the scale matrices 

A 

delta 

pxqx g 

the skewness parameters 

V 

dof 

gxl 

the degrees of freedom 

TT 

pro 

5X1 

the mixing proportions 


Table 1: Structure of the model parameters in EMMIXcskew. 

• fmcfust. contour.3d: plotting 3D surface contours of a FM-CFUST model. 


4.1. The density function of FM-CFUST distribution 

The density of a FM-CFUST distribution is calculated by the dcfust function. The inputs 
to be passed into this function are similar to that in dfmmst from the EMMIXuskew package, 
and must be structured as described in Table 1. Briefly, the parameters fi, S, and S are each 
implemented as a list of g matrices, where g is the number of components in the fitted model. 
Each element of the list objects mu, sigma, and delta {h = must be specified 

as a p X 1, p X p, and p x q matrix, respectively. The parameters dof and pro are g hy 1 
arrays, representing the degrees of freedom and the mixing proportions for each component, 
respectively. Finally, the input data are specified by dat, an n x p matrix where each row 
represents an individual observation. 

Typically, one may be interested in calculating density values for a fitted FM-CFUST model 
(obtained from the fmcfust function). In this case, the output object of the fitted model can 
be directly passed in to dfmcfust as a single argument known. Note that if both known and 
all the model parameters were provided by he user, only the values specified by known would 
be used. Issuing the following command will return a vector of n x 1 density values. 

dfmcfust(dat, mu, sigma, delta, dof, pro, known=NULL) 

For a single CFUST distribution (that is, g=l), the dcfust function can be used. Here, the 
arguments mu, sigma, and delta need not be a list object, but mu can be a numeric array, 
and sigma and delta are matrices. Similar to the above function call, the dcfust can be 
called at the R command prompt as follows, which will return a numeric vector of density 
values. 

dcfust(dat, mu, sigma, delta, dof) 

4.2. Fitting a single CFUST distribution 

To fit the FM-CFUST model with a single component (g = 1), the main routine fmcfust can 
be used. This implements the EM algorithm described in Section 3, with the default strategy 
for initial values given in (13). By default, q is assumed to be the same as p. A typical call 
of fmcfust is; 

fmcfustCg, dat, q=p, initial=NULL, known=NULL, itmax=100, eps=le-6, 
nkmeans=20, verhose=TRUE) 
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As a simple example, we consider the iris dataset, available directly in R. For illustration 
purposes, we look at the Versicolor species and focus on the two variables Sepal.Widthh and 
Pedal. Length. We first create a new data object iris .versicolor with the required data, 
then execute the fmcfust function with g = 1- This is the minimum information that must 
be supplied to fmcfust. 

R> library(EMMIXcskew) 

R> data(iris) 

R> iris.versicolor <- iris[iris$Species=="versicolor",2:3] 

R> Fit.versicolor <- fmcfust(l, iris.versicolor) 

The above command will return a fmcfust object, containing the final estimate of the pa¬ 
rameters, the predicted cluster labels, and a number of measures associated with the fit¬ 
ted model. The final estimated parameters can be accessed using Fit. versicolor$mu. 

Fit. versicolor$sigma. Fit. versicolor$delta. Fit. versicolor$dof , and Fit. versicolor$pro 
To view these parameters, summary can be called: 

R > summary(Fit.versicolor) 

Finite Mixture of Multivariate CFUST Distribution 
with 1 component 

Mean: 

[,1] 

[1,] 3.415878 
[2,] 4.886890 

Scale matrix: 

[, 1 ] [, 2 ] 

[1,] 0.006138577 -0.007283746 

[2,] -0.007283746 0.020649780 

Skewness matrix: 

[, 1 ] [, 2 ] 

[1,] -0.4901844 -0.37242352 
[2,] -0.9067630 0.03643873 

Degrees of freedom: 

[1] 87.47343 


The other arguments of fmcfust are similar to that used in fmmst from the EMMIXuskew 
package. Briefly, known is a list of model parameters that are known a priori and, if sup¬ 
plied, will not be updated in the iterations of EM algorithm. The arguments itmax and eps 
determine when the EM algorithm is terminated. If either the maximum number of itera¬ 
tions as specified by itmax is reached or the tolerance as specified by eps is obtained, the 
EM loop will terminate. User specified initial values can be supplied using initial. Note 
that this must be a list object structured as in Table 1. The argument nkmeans speci¬ 
fies the number of fe-means trials to be preformed when using the default starting strategy. 
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versicolor 


ro 

0) 

Q_ 



Figure 1: Contours of a FM-CFUST model fitted to the versicolor data. 


With verbose=TRUE, fmcfust prints the log likelihood value at each iteration and displays a 
summary of the estimated parameters of the model. 

Note that in the above example we have used the default starting strategy to generate initial 
values, and assume q = p. As pointed out in Section 3, this may not always give the optimal 
fit. It is highly recommended that the EM algorithm is run from a range of different starting 
values. Some alternative methods for generating different starting values are discussed in 
Section 3.3. These are implemented in init.cfust (see Section 5.2 for further discussions). 
In addition, the user can also experiment with different values of q. However, it is interesting 
to note that for the simulated dataset of Section 4.1 in Lee and McLachlan (2016a), it was 
observed that the FM-CFUST model is able to (roughly) recover the structure of A without 
prior knowledge of any constraints on the matrix of skewness parameters. 

To assist in choosing a suitable model for the data from a range of different fitted results 
(for example, using different starting values), log likelihood values and information measures 
such as AIC and BIC can be compared. These values are available as part of the out¬ 
put of fmcfust and can be accessed using Fit. versicolor$loglik. Fit. versicolor$aic, 
and Fit.versicolor$bic. They can also be viewed using the print command as shown 
below for the example above. The contours of the fitted model can be visualized using 
fmcfust. contour. 2d (see Figure 1). Further details and examples of contour plots will be 
given in Section 5.5. 

> print(Fit.versicolor) 

Finite Mixture of Multivariate CFUST Distribution 
with 1 component 
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. .. the first five elements omitted ... 


$tau 

[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] 

[ 1 ,] 111111111 1 1 

$clusters 

[ 1 ] 1111111111111111111 
... the rest omitted.. 

$loglik 

[1] -32.30904 

$aic 

[1] 84.61809 
$bic 

[1] 103.7383 


4.3. Fitting a FM-CFUST distribution 

Consider now the fitting of a three-component FM-CFUST model to the entire iris dataset. 

It consists of four geometric measurements on 150 observations of Iris, with 50 observations 
from each of three species of Iris (Setosa, Virginica, and Versicolor). The following code hts 
a FM-CFUST model using the results of a FM-uMST model as starting values. 

R> fit.unrestricted <- fmmst(3,iris [,-5]) 

R> fit.iris <- fmcfust(3, iris[,-5], initial=fit.unrestricted, method="EMMIXuskew") 
Again, this returns a fmcfust object. 

Finite Mixture of Multivariate CFUST Distributions 
with 3 components 


Iteration 

Iteration 

Iteration 

Iteration 

Iteration 

Iteration 


0 : loglik = 

1 : loglik = 

2 : loglik = 

3 : loglik = 

4 : loglik = 

5 : loglik = 
rest omitted .. 


-171.9228 

-170.8919 

-170.419 

-170.027 

-169.6717 

-169.3511 


Iteration 100: loglik = -154.561 
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Component means: 

[,1] [,2] [,3] 

[1,] 4.8679805 6.345333 5.888617 
[2,] 3.2808574 3.066393 2.808073 
[3,] 1.4284854 4.474453 5.037550 
[4,] 0.1343135 1.317850 2.072705 

Component scale matrices: 


[[!]] 

[,1] 

[,2] 

[,3] 

[,4] 

[IJ 

0.0443155106 

0.062560933 

-0.001555864 

-0.0009055587 

[2,] 

0.0625609328 

0.113034979 

-0.005108048 

-0.0020813146 

[3,] 

-0.0015558645 

-0.005108048 

0.004406555 

-0.0001740700 

[4J 

-0.0009055587 

-0.002081315 

-0.000174070 

0.0014520658 

[[2]] 

[,1] 

[,2] 

[,3] 

[,4] 


[1,] 0.09076830 0.025306942 0.025998408 0.019130182 
[2,] 0.02530694 0.017246048 0.006082013 0.013000612 
[3,] 0.02599841 0.006082013 0.018261259 0.007659824 
[4,] 0.01913018 0.013000612 0.007659824 0.013618464 

[[3]] 

[,1] [,2] [,3] [,4] 

[1,] 0.21080765 0.09796696 0.14128463 0.08171739 
[2,] 0.09796696 0.06083267 0.07166802 0.04816533 
[3,] 0.14128463 0.07166802 0.11360874 0.07217753 
[4,] 0.08171739 0.04816533 0.07217753 0.06425322 


Component skewness matrices: 

[[ 1 ]] 

[,1] [,2] [,3] [,4] 

[1,] 0.31881607 -0.27858936 0.007451723 0.12347998 

[2,] 0.07624952 -0.11956708 0.069916382 0.16058854 

[3,] -0.04518459 -0.15517472 0.171888937 0.07288276 

[4,] 0.01427354 -0.01726205 0.004160663 0.14294473 

[[ 2 ]] 

[,1] [,2] [,3] [,4] 

[1,] -0.26806780 0.08357397 -0.5825049 0.254300197 

[2,] 0.15339662 -0.23236022 -0.3423885 0.045900090 

[3,] 0.05249894 0.22566719 -0.6678188 0.115936808 

[4,] 0.12559134 0.08309988 -0.2042694 0.007060828 


[[ 3 ]] 
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[IJ 

[ 2 ,] 

[3,] 

[4J 


[, 1 ] 

0.136541936 

-0.017338343 

-0.219878985 

-0.006770662 


[, 2 ] 

-0.13366639 

0.15012982 

0.03899513 

0.10861986 


[,3] 

0.51628869 

-0.15683670 

0.53218928 

-0.04394092 


[,4] 

0.3692744 

0.2290092 

0.3051550 

-0.1217400 


Component degrees of freedom: 
51.74962 185.87390 126.98517 


Component mixing proportions: 
0.3333333 0.3314298 0.3352368 


4.4. Nested special cases of the FM-CFUST distribution 

In this section, we focus on the restricted and unrestricted versions of MST mixture models. 
For the normal and t-mixture models, routines for fitting them are implemented in EM- 
MlXskew. As noted earlier, the rMST distribution corresponds to a CFUST distribution 
with q = 1. Thus setting q=l in fmcfust will fit a FM-rMST model. However, as EM- 
MlXskew uses a specialized implementation of the EM algorithm for this model, the user is 
encouraged to use this package when fitting a FM-rMST model. Similarly, the EMMIXuskew 
package can be used for the fitting of a FM-uMST model. To fit the FM-rMST model to the 
same dataset as in the example above using the EMMIXcskew package, the following code 
can be used. 

> fit.restricted <- fmcfust(3, iris[,-5],q=l) 


The above model can also be fitted using the EMMIXskew package with the command 
fit. restricted <- EmSkew(iris [,-5] ,3, "mst" ,debug=FALSE). We can compare the pre¬ 
dicted cluster labels of these models against the true class labels. Eor all three models, the 
predicted cluster labels are stored as dust in the output object. A cross-tabulation of these 
labels suggests that the fitted EM-CEUST model performs well with only one misclassified 
observation, whereas the EM-rMST and EM-uMST models have three and six misclassified 
observations, respectively. 


R> table(iris$Species, fit.iris$clust) 

12 3 
setosa 0 50 0 
versicolor 49 0 1 
virginica 0 0 50 

R> table(iris$Species, fit.restricted$clust) 

12 3 
setosa 0 6 44 
versicolor 50 0 0 
virginica 50 0 0 
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Model 

FM-CFUST 

FM-rMST 

FM-uMST 

MCR 

0.0067 

0.3733 

0.0200 


Table 2: Misclassification rate of the three skew t-mixture models fitted to the iris dataset. 

R> table(iris$Species, fit.unrestricted$clust) 

12 3 

setosa 50 0 0 

versicolor 0 47 3 

virginica 0 0 50 

The misclassification rate (MCR) against the true labels can be calculated using error. rate 
from the EMMIXskew package. In this example, the FM-CFUST model obtained the lowest 
MCR compared to the FM-rMST and FM-uMST models (see Table 2 and the code below). 
Figure 2 shows the clustering of the iris dataset using these three models. 

R> error.rate(unclass(iris$Species), fit.iris$clust) 

[1] 0.006666667 

R> error.rate(unclass(iris$Species), fit.restricetd$clust) 

[1] 0.3733333 

R> error.rate(unclass(iris$Species), fit.unrestricted$clust) 

[1] 0.02 
R> 

R> panell <- function(x,y,...)-ipoints(x,y,col=c("red","green3","hlue") 

+ [fit.iris$clust],pch=20)} 

R> panel2 <- function(x,y,...){points(x,y,col=c("red","green3","blue") 

+ [fit.unrestricted$clust],pch=20)} 

R> panel3 <- function(x,y,...){points(x,y,col=c("red","green3","blue") 

+ [fit.restricted$clust],pch=20)} 

R> pairs(iris [1:4], main = "Iris Data", pch = 20, col = c("red","green3","blue") 
+ [unclass(iris$Species)], lower.panel=panell) 

R> pairs(iris [1:4], main = "Iris Data", upper.panel=panel2, lower.panel=panel3) 

In the case of univariate data, note that all three models become identical, and thus the use of 
EmSkew from the EMMIXskew package is recommended as it provides a more computationally 
efficient implementation. It should be noted that the fitting of the FM-uMST and FM-CFUST 
models can be time consuming due to the amount of calculations required, especially when 
q is large. When tested on a 3.4GHz machine, the example in Section 4.2 took around 3.5 
seconds to complete. For the examples in this Section, the CPU time for the FM-CFUST, 
FM-uMST, and FM-rMST models is around 1856, 1927, and 13 seconds, respectively. Note 
that if the specialised implementation of the EMMIXskew package is used, the CPU time for 
the FM-rMST model in this example reduces to 0.8 seconds. 


5. Miscellaneous functions 
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Iris Data Iris Data 

2.0 2.5 3.0 3.5 4.0 0.5 1.0 1.5 2.0 2.5 2-0 2.5 3.0 3.5 4.0 0.5 1.0 1.5 2.0 2.5 




Figure 2: Clustering of the iris dataset. The upper panels of the figure on the left show 
the true labels, whereas the lower panels are the predicted labels given by the FM-CFUST 
model. For the figure on the right, the upper panels correspond to the results given by the 
FM-uMST model and the lower panels correspond to that given by the FM-rMST model. 


This section presents some illustrative examples on how to generate random observations, 
generate/provide initial values (for the EM algorithm), use different stopping criteria, and 
draw contour plots for a FM-CFUST model using EMMIXcskew. 

5.1. Random sample from the FM-CFUST distribution 

The CFUST admits a convolution-type stochastic representation that facilitates random sam¬ 
ple generation. More specifically, let Uq and Ui be independent random variables following 
multivariate normal distributions given by A^p(0, S) and Ng{0,lg), respectively. Let also w 
be a scalar random variable with the gamma(|, |) distribution. Then 

Y = ti+^A\Uo\ + ^Uo (22) 

yw ^Jw 

has a CFUST distribution with density given by (1). The rcfust function adopts (22) to 
generate a random sample of CFUST observations. Its mixture version is implemented as 
rfmcfust in EMMIXcskew. These two functions are given, respectively, by 

rcfust(n, mu, sigma, delta, dof, known=NULL, ...) 
rfmcfust(g, n, mu, sigma, delta, dof, pro, known=NULL, ...) 

Input arguments for the above functions follow the same structure as described in Section 

4.1, permitting the parameters of the CFUST (or FM-CFUST) model to be specified either 
individually using mu, sigma, delta, and dof (and also pro for a FM-CFUST model) or within 
a list object using known. The argument n specifies the number of random observations to 
be generated. In the case of a FM-CFUST model, n is either a single integer (which represents 
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the total number of observations to be generated) or a vector of g integers representing the 
number of observations to be generated from each of the g component. Note that if n is a 
single value, rfmcfust will determine the sample size for each component using the mixing 
proportion specified by pro. 


As an example, suppose one would like to generate 10 random observations from a CFUST 


distribution with /r = (1,2) , S = I 2 , A = 


2 1 
1 2 


, and z/ = 4, the following command can 


be issued at the R command prompt. A 10 x 2 matrix will be returned. 


R> rfust(10,c(l,2),diag(2),matrix(c(2,1,1,2),2,2),4) 

[,1] [,2] 

[1,] 5.836001 5.600793 

[2,] 3.080172 4.213700 

[3,] 3.305617 4.888012 

[4,] 4.390739 3.109635 

[5,] 4.003996 4.686407 

[6,] 1.609795 1.599386 

[7,] 3.361534 5.326190 

[8,] 3.449745 4.474217 

[9,] 10.886028 7.964134 
[10,] 5.752894 7.049037 

Generating observations from a mixture of CFUST distributions is also quite simple using 
rfmcfust. We first create an object with the required parameters. This can then be directly 
passed in to rfmcfust. An example is shown below. A n x (p + 1) matrix is returned where 
the last column gives component label for each data point generated. 

R> obj <- listO 

R> ohj$mu <- list(c(17,19), c(5,22), c(6,10)) 

R> ohj$sigma <- list(diag(2), matrix(c(2,0,0,l),2), 

+ matrix(c(3,7,7,24) ,2)) 

R> obj$delta <- list(matrix(c(3,0,2,l.5),2,2), matrix(c(5,0,0,10),2,2), 

+ matrix(c(2,0,5,0),2,2)) 

R> obj$dof <- c(l, 2, 3) 

R> objSpro <- do.2b, 0.25, 0.5) 

R> rfmcfust(3, 100, known=obj) 

[,1] [,2] [,3] 

[1,] 46.143907 25.56151304 1 

[2,] 17.816665 18.22572581 1 

[3,] 33.915805 25.54308697 1 

[4,] 44.609637 15.81099978 1 

[5,] 54.766995 16.10253015 1 

[6,] 18.610320 19.74165026 1 

[7,] 25.303312 20.80981782 1 

[8,] 20.608770 21.58460735 1 

[9,] 19.679756 20.51390429 1 
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[10,] 20.988970 16.10998335 1 

... the rest omitted ... 


5.2. Starting values for fitting FM-CFUST distributions 

Three different strategies for generating starting values for the FM-CFUST model were de¬ 
scribed in Section 3.3. These are implemented in the EMMIXcskew package. Apart from 
the default starting strategy (13) which makes use of moment-based estimates of a uMSN 
distribution, the init.cfust function implements the transformation approach (15) as one 
of its options, and accepts starting values based on the results of its nested models as another 
option. The arguments of the function are the following: 

init.cfust(g, dat, q=p, initial=NULL, known=NULL, clust=NULL, nkmeans=20, 
method=c("moments","transformation","EMMIXskew","EMMIXuskew")) 

To use a fitted model given by the packages EMMIXskew and EMMIXuskew, set method to 
"EMMIXskew" and "EMMIXuskew", respectively, and the output of the functions EmSkew and 
fmmst can be directly passed into init.fust using the argument initial. If an initial value 
of the parameter vector is not supplied (that is, initiaI=MULL), then the default option is to 
provide an initial value for each component-parameter vector obtained by applying the method 
of moments (that is, method="moments") to the clusters corresponding to the components. 
These clusters are obtained by using the fe-means procedure, but the user can specify an 
initial partition obtained using some other method of clustering, for example, a model-based 
approach using a mixture of t-distributions. The user-specified initial partition is passed in 
using dust. If the transformation approach is preferred, set method="transformation". 
Again, in this case, if an initial partition is not supplied, the partition given by /c-means 
clustering is used. 

An example session is shown below demonstrating how to use the above function to generate 
different starting values. We use the Geyser dataset (Azzalini and Bowman 1990), which 
contain measurements on 299 successive eruptions of the Old Faithful Geyser during 1 August 
to 15 August 1985. The two variables recorded were the waiting time between eruptions and 
the duration of each eruption, both measured in minutes. This dataset is available from the 
MASS package. 

R> library(MASS) 

R> data(geyser) 

R> plot(geyser, pch=20) 

An initial inspection of the dataset (Figure 3(a)) suggests three clusters. Hence, we set 
g to 3. In the example below, initial. default and initial. transformation refers to 
default (moment-based) approach and the transformation approach, respectively. For the 
nested approach, we have demonstrated in Section 4.3 how to use the results of a fitted FM- 
uMST model as initial values. In that example, the model was fitted using fmmst () in our 
package, which is a replica of the same function in the EMMIXuskew package, and hence the 
option method="EMMIXuskew" was used. This option can be used in the same way to supply 
initial values from the EMMIXuskew package. In addition, the EMMIXcskew package also 
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accepts outputs from the EMMIXskew package which provide routines to fit finite mixtures 
of normal, t, (restricted) skew-normal, and (restricted) skew t-distributions. In this case, the 
option method="EMMIXskew" is used to pass the results to fmcfustO. 

R> initial.default <- init.cfust(3, geyser) 

R> initial.transformation <- init.cfust (3, geyser, method="transformation") 

R> fit.geyser.restricted <- EmSkew(geyser, 3, "mst", dehug=FALSE) 

R> initial.restricted <- init.cfust(3, geyser, initial=fit.gesyser.restricted, 

+ method="EMMIXskew") 

R> fit.geyser.unrestricted <- fmmst(3, geyser) 

R> initial.unrestricted <- init.cfust(3, geyser, initial=fit.gesyser.unrestricted, 
+ method="EMMIXuskew") 

R> fit.geyser.t <- EmSkew(geyser, 3, "mvt", dehug=FALSE) 

R> initial.t <- init.cfust (3, geyser, initial=fit.gesyser.t, method="EMMIXskew") 

To help choose an appropriate starting value, we can compare the log likelihood values for 
the FM-CFUST model fitted using these initial values. 

R> initial.default$loglik 
[1] -1903.598 

R> initial.transformation$loglik 
[1] -1448.33 

R> initial.restricted$loglik 
[1] -1335.038 

R> initial.unrestricted$loglik 
[1] -1404.772 
R> initial.t$loglik 
[1] -1347.385 

In this case, the starting value that corresponds to the fitted model of FM-rMST gave the 
highest log likelihood value. We now proceed to fit a FM-CFUST model using the default 
starting strategy, the transformation approach, and the fitted model of FM-rMST. 

R> fit.geyserl <- fmcfust(3, geyser, initial=initial.default) 

R> fit.geyser2 <- fmcfust(3, geyser, initial=initial.transformation) 

R> fit.geyser3 <- fmcfust (3, geyser, initial=initial.restricted) 

R> fit.geyserl$loglik 

[1] -1415.442 

R> fit.geyser2$loglik 

[1] -1344.724 

R> fit.geyser3$loglik 

[1] -1333.533 

According to the use of the final log likelihood value, the results of f it .geyser3 are preferred. 

This corresponds to the result using the initial value with the highest log likelihood value 
identified above. Note that this may not always be the case; that is, using the initial value 
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with the highest log likelihood value may not always give the optimal results compared to 
those with smaller initial log likelihood values. It is advisable to run the EM algorithm using 
a range of different starting values. To visualize the clustering results of the above three 
models, we can plot the data with colours according to the predicted cluster labels given by 
these models, as shown below. Figures 3(a), 3(b), and 3(c) show the results using the default 
strategy, the transformation approach, and the fitted FM-rMST model, respectively. It can 
be observed that fit.geyser2 perhaps gave a more natural partition of the data, although 
its log likelihood value is lower than that given by fit .geyserS. 

R> plot(geyser, pch=20, col=c("red","blue","green")[fit.geyserl$clust]) 

R> plotCgeyser, pch=20, col=c("red","blue","green")[fit.geyser2$clust]) 

R> plotCgeyser, pch=20, col=c("red","blue","green")[fit.geyser3$clust]) 


5.3. Stopping Criteria 

The stopping criteria described in Section 3.4 are available through the convergence option in 
fmcfust 0 . The default is using Aitken’s acceleration-based approach (convergence="Aitken"). 
The other two options are convergence="likelihood" and convergence= "parameters", re¬ 
ferring to relative likelihood based and relative parameters-based approach respectively. For 
illustration, using the Geyser dataset and intiial. restricted as initial values as an exam¬ 
ple, we can run the EM algorithm with the relative likelihood-based and relative parameter- 
based convergence criteria using the following commands. 

R> fit.geyser4 <- fmcfust(3, geyser, initial = initial.restricted, 

+ convergence=c("likelihood")) 

Finite Mixture of Multivariate CFUST Distributions 
with 3 components 


Iteration 

0 

loglik = -1335.038 

Iteration 

1 

loglik = -1335.01 

Iteration 

2 

loglik = -1334.987 

Iteration 

3 

loglik = -1334.965 

Iteration 

4 

loglik = -1334.943 

Iteration 

5 

loglik = -1334.922 

.. rest omitted ... 


Iteration 100: loglik = -1333.533 


R. fit.geyserS <- fmcfust(3, geyser, initial = initial.restricted, 
+ convergence=c("parameters")) 


Finite Mixture of Multivariate CFUST Distributions 
with 3 components 
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waltng 


waiting 


(C) 


(d) 


Figure 3: The Old Faithful geyser data from the MASS package, (a) Scatter plot of the 
data, (b) Clustering results of the FM-CFUST model using the default (moments-based) 
approach for generating starting values, (c) Clustering results of the FM-CFUST model 
using the transformation approach for generating starting values, (d) Clustering results of 
the FM-CFUST model using the results of FM-rMST model as starting values. 
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Iteration 0 
Iteration 1 
Iteration 2 
Iteration 3 
Iteration 4 
Iteration 5 


: loglik = -1335.038 

: loglik = -1335.01 

: loglik = -1334.987 

: loglik = -1334.965 

: loglik = -1334.943 

: loglik = -1334.922 

. .. rest omitted ... 


Iteration 100: loglik = -1333.533 

In both cases, we can observe from the output above that the EM algorithm terminates in 
the same iteration for this example. 

5.4. Choosing the number of components g with BIC 

Model selection criteria are typically used to guide the choice of g when fitting finite mixture 
models. The fmcfustO function provides the values of AIC and BIC as part of the output 
when fitting a FM-CFUST model. We show here a short example of using BIC to assist 
in choosing the optimal value of g. We fit the FM-CFUST model with g = 1,... ,4 to the 
Geyser data, using the default starting strategy. In this case, the lowest BIC corresponds to 
the model with g = 2. 

R> fit.geyser.gl <- fmcfust(l, geyser) 

R> fit.geyser.g2 <- fincfust(2, geyser) 

R> fit.geyser.g3 <- fit.geyser1 
R> fit.geyser.g4 <- fmcfust(4, geyser) 

R> fit.geyser.gl$bic 
[1] 3166.698 
R> fit.geyser.g2$bic 
[1] 2963.833 
R> fit.geyser.g3$bic 
[1] 3013.298 
R> fit.geyser.g4$bic 
[1] 3069.547 

5.5. Visualization of fitted contours 

Contour plots for a FM-CFUST model can be produced easily using the functions 
fmcfust. contour. 2d and fmcfust. contour. 3d for a two-dimensional and three-dimensional 
space, respectively. These two functions take a number of arguments described below. 

fmcfust. contour.2d(dat, model, grid=50, drawpoints=TRUE, clust=NULL, 
nlevels=10, component=NULL, ...) 

fmcfust. contour.3d(dat, model, grid=20, drawpoints=TRUE, clust=NULL, 
levels=0.9, component=NULL, ...) 
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Briefly, dat is a dataset that is either a matrix or data.frame. Note that if dat is not 
specified, then the limits of the axes of the plot must be specified (using the standard xlim, 
ylim, and zlim arguments). The parameters of the FM-CFUST model are specified using 
model. Typically, this is an output from fmcfust. The argument grid determines the grid 
size of the plots. Thus the higher the number in grid, the smoother the contours (at the cost 
of longer computation time). The data points (if provided) are plotted by default. By setting 
drawpoints=FALSE, only the contours will be plotted. In the case where <7 > 1, a user-specified 
partition of the data can be provided using dust. This is used when drawpoints=TRUE and 
the data points in the plot will be colour-coded using the labels in dust. The arguments 
nlevels and levels control how many contours are displayed and at which percentiles are 
they computed, respectively. Finally, component specifies which component is included in 
the plot. This option allows the components to be plotted individually without taking into 
account the mixing proportions. In contrast, the default is to return the contours of a mixture 
density. 

To illustrate the use of fmcfust. contour. 2d, we reconsider the iris .versicolor example 
in Section 4.2. Figure 1 can be generated by the following command in R. 

R> fmcfust.contour.2d(iris.versicolor, fit.Versicolor, drawpoints=FALSE, 

+ main="versicolor") 

We now turn to an example of generating a 3D contour plot of a FM-CFUST model. Sup¬ 
pose we would like to draw the contours of a two-component FM-CFUST distribution with 
parameters fii = (0,0,0)"'", /X 2 = (5,5,5)"'', ui = 1/2 = 3, tt = (0.2, 0.8), 
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We first create an object obj with these parameters. By default, a mixture density is produced 
on running the fmcfust. contour. 3d (see Figure 4(a)). A first remark on this figure is that 
the two components seem to be ‘joined’ together. To gain a better view of the shapes of these 
two components, we may set components=l: 2 so that their mixing proportions are ignored. 

In addition, we generate 500 random observations from the specified FM-CFUST model and 
include them in the plot. Observe now in Figure 4(b) that the two components are plotted 
as two separately objects and their colours are matched with the simulated data. 

R> obj <- listO 

R> ohj$mu <- list(matrix(c(0,0,0),3), matrix(c(5,5,5),3)) 

R> ohj$sigma <- list(matrix(c(5,2,1,2,5,1,1,1,1),3,3), 2*diag(3)) 

R> obj$delta <- list(matrix(c(1,0,0,1,0,0,1,0,0),3,3), 

+ matrix(c(5,0,0,0,10,0,0,0,15),3,3)) 

R> obj$dof <- c(3,3) 

R> ohj$pro <- c (0.2, 0.8) 

R> fmcfust.contour.3d(model=ohj, level=0.98, drawpoints=TRUE, xlab="X", 

+ ylab="Y", zlab="Z", xlim=c(-20, 50), ylim=c(-20,50), zlim=c(-20,80)) 

R> X <- rfmcfust(2, 500, known=obJ) 

R> fmcfust.contour.3d(X, model=obj, level=c(0.99, 0.92), drawpoints=TRUE, 

+ clust=X[,4], xlab="X", ylab="Y'', zlab="Z", xlim=c(-20, 50), ylim=c(-20, 50), 
+ zlim=c(-20, 80), component=l:2) 
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Figure 4: Contour plots of FM-CFUST models generated by EMMIXcskew package, (a) 3D 
contours of the density of a FM-CFUST distribution, (b) 3D contours of the density of each 
component of the FM-CFUST distribution shown in (b). 


6. Concluding remarks 

This paper has presented the EMMIXcskew package for the fitting of a CFUST distribution 
and hnite mixtures of CFUST distributions to heterogeneous data that exhibit non-normal 
features. In addition to computing the maximum likelihood estimates of the model param¬ 
eters, the EMMIXcskew package provides routines for random number generation, density 
evaluation, the plotting of 2D and 3D contours, and a few different methods for initial values 
generation for the FM-CFUST model. A finite mixture of CFUST distributions provides a 
model for the robust extension of traditional normal mixture models, with greater flexibility 
in handling asymmetry and heavy tails. The skewness parameters of a CFUST distribution 
are characterized by a general matrix, which provides an elegant unification of the restricted 
and unrestricted skew t-distributions. The aim of this package is to provide users with the 
option of fitting this flexible distribution to their dataset. Model selection criteria such as the 
AIC and BIC are provided for the FM-CFUST model to assist the user in choosing between 
different models for their data. 

It is noted that the htting of a FM-CFUST model can be quite computationally intensive when 
q is large. This is due to the calculations of some of the conditional expectations involved in 
the E-step of the EM algorithm. Euture work will consider applicable strategies to speed up 
the parameter estimation process for this model such as parallel implementations described 
in Lee et al. (2016b,a) and Lee and McLachlan (2016b). 


Acknowledgments 


This work is supported by grants from the Australian Research Council. 













26 


EMMIXcskew: Finite Mixtures of Canonical Fundamental Skew t-distributions 


References 


Abanto-Valle CA, Lachos VH, Dey DK (2015). “Bayesian Estimation of a Skew-Student- 
t Stochastic Volatility Model.” Methodology and Computing in Applied Probability, 17, 
721-738. 

Akaike H (1974). “A New Look at the Statistical Model Identification.” Automatic Control, 
19, 716-723. 

Arellano-Valle RB, Azzalini A (2006). “On the Unification of Families of Skew-Normal Dis¬ 
tributions.” Scandinavian Journal of Statistics, 33, 561-574. 

Arellano-Valle RB, Branco MD, Genton MG (2006). “A Unified View on Skewed Distributions 
Arising from Selections.” The Canadian Journal of Statistics, 34, 581-601. 

Arellano-Valle RB, Genton MG (2005). “On Fundamental Skew Distributions.” Journal of 
Multivariate Analysis, 96, 93-116. 

Asparouhov T, Muthen B (2016). “Structural Equation Models And Mixture Models With 
Gontinuous Nonnormal Skewed Distributions.” Structural Equation Modeling, 23, 1-19. 

Azzalini A (2005). “The Skew-normal Distribution and Related Multivariate Families.” Scan¬ 
dinavian Journal of Statistics, 32, 159-188. 

Azzalini A (2014). The R sn package : The Skew-Normal and Skew-t Distributions (version 
1.0-0). Universita di Padova, Italia. URL http://azzalini.stat.unipd.it/SN. 

Azzalini A, Bowman AW (1990). “A Look at Some Data on the Old Faithful Geyser.” Applied 
Statistics, 39, 357-365. 

Azzalini A, Browne RP, Genton MG, McNicholas P (2016). “On nomenclature for, and the 
relative merits of, two formulations of skew distributions.” Statistics and Probaility Letters, 
110, 201-206. 

Azzalini A, Gapitanio A (2003). “Distributions Generated by Perturbation of Symmetry with 
Emphasis on a Multivariate Skew t Distribution.” Journal of the Royal Statistical Society 
B, 65, 367-389. 

Azzalini A, Gapitanio A (2014). The Skew-Normal and Related Families. Institute of Math¬ 
ematical Statistics Monographs. Cambridge University Press, UK. 

Banfield JD, Raftery AE (1993). “Model-Based Gaussian and Non-Gaussian Clustering.” 
Biometrics, 49, 803-821. 

Benaglia T, Chauveau D, Hunter DR, Young DS (2009). “mixtools: An R Package for Ana¬ 
lyzing Mixture Models.” Journal of Statistical Software, 32(6), 1-29. 

Bernard! M (2013). “Risk Measures for Skew Normal Mixtures.” Statistics & Probability 
Letters, 83, 1819-1824. 

Biecek P, Szczurek E, Vingron M, Tiuryn J (2012). “The R Package bgmm: Mixture Modeling 
with Uncertain Knowledge.” Journal of Statistical Software, 47(3), 1-31. 


Sharon Lee <fe Geoffrey McLachlan 


27 


Bohning D (2000). Computer Assisted Analysis of Mixtures and Applications: Meta-Analysis, 
Disease Mapping, and Others. Chapman and Hall/CRC, London. 

Bohning D, Dietz E, Schaub R, Schlattmann, P Lindsay BG (1994). “The Distribution of the 
Likelihood Ratio for Mixtures of Densities from the One-parameter Exponential Family.” 
Annals of the Institute of Statistical Mathematics, 46, 373-388. 

Branco MD, Dey DK (2001). “A General Class of Multivariate Skew-Elliptical Distributions.” 
Journal of Multivariate Analysis, 79, 99-113. 

Contreras-Reyes JE, Arellano-Valle RB (2013). “Growth Estimates of Cardinalfish (Epigonus 
Crassicaudus) Based on Scale Mixtures of Skew-Normal Distributions.” Fisheries Research, 
147, 137-144. 

Dempster AP, Laird NM, Rubin DB (1977). “Maximum Likelihood From Incomplete Data 
via the EM Algorithm.” Journal of Royal Statistical Society B, 39, 1-38. 

Everitt BS, Hand DJ (1981). Finite Mixture Distributions. Ghapman & Hall, London. 

Fraley G, Raftery AE (1998). “How Many Glusters? Which Clustering Methods? Answers 
via Model-Based Cluster Analysis.” Computer Journal, 41, 578-588. 

Fraley C, Raftery AE (2007). “Model-Based Methods of Classification: Using the mclust 
Software in Chemometrics.” Journal of Statistical Software, 18(6), 1-13. 

Friihwirth-Schnatter S (2006). Finite Mixture and Markov Switching Models. Springer-Verlag, 
London. 

Friihwirth-Schnatter S, Pyne S (2010). “Bayesian Inference for Finite Mixtures of Univariate 
and Multivariate Skew-Normal and Skew-t Distributions.” Biostatistics, 11, 317-336. 

Genton MG (ed.) (2004). Skew-Elliptical Distributions and Their Applications: A Journey 
Beyond Normality. Chapman & Hall, CRC. 

Griin B, Leisch F (2008). “FlexMix Version 2: Finite Mixtures with Concomitant Variables 
and Varying and Constant Parameters.” Journal of Statistical Software, 28(4), 1-35. 

Gupta AK (2003). “Multivariate Skew-t Distribution.” Statistics, 37, 359-363. 

Ho HJ, Lin TI, Chang HH, Haase HB, Huang S, Pyne S (2012). “Parametric Modeling of 
Cellular State Transitions as Measured With Flow Cytometry Different Tissues.” BMC 
Bioinformatics, 13, (Suppl 5): S5. 

Hu X, Kim H, Brennan PJ, Han B, Baecher-Allan CM, De Jager PL, Brenner MB, Ray- 
chaudhuri S (2013). “Application of User-Guided Automated Gytometric Data Analysis to 
Large-Scale Immunoprofiling of Invariant natural killer T cells.” Proceedings of the National 
Academy of Sciences USA, 110, 19030-19035. doi: 10.1073/pnas. 1318322110. 

Lachos VH, Ghosh P, Arellano-Valle RB (2010). “Likelihood Based Inference for Skew Normal 
Independent Linear Mixed Models.” Statistica Sinica, 20, 303-322. 

Lee S, McLachlan GJ (2014a). “Finite Mixtures of Multivariate Skew t-Distributions: Some 
Recent and New Results.” Statistics and Computing, 24, 181-202. 


28 


EMMIXcskew: Finite Mixtures of Canonical Fundamental Skew t-distributions 


Lee SX, Leemaqz KL, McLachlan GJ (2016a). “A block EM algorithm for multivariate skew 
normal and skew t-mixture models.” arXiv:1608.02797. 

Lee SX, Leemaqz KL, McLachlan GJ (2016b). “A simple parallel EM algorithm for statistical 
learning via mixture models.” In Proceedings of the 2016 International Conference on 
Digital Image Computing: Techniques and Applications, pp. 295-302. 

Lee SX, McLachlan GJ (2013a). “Model-Based Clustering and Classification with Non-Normal 
Mixture Distributions.” Statistical Methods and Applications, 22, 427-454. 

Lee SX, McLachlan GJ (2013b). “Modelling Asset Return Using Multivariate Asym- met¬ 
ric Mixture Nodels with Applications to Wstimation of Value-at-Risk.” In J Piantadosi, 
RS Anderssen, J Boland (eds.), MODSIM 2013 (20th International Congress on Modelling 
and Simulation), pp. 1228-1234. Adelaide, Australia. 

Lee SX, McLachlan GJ (2013c). “On Mixtures of Skew-Normal and Skew t-Distributions.” 
Advances in Data Analysis and Classification, 7, 241-266. 

Lee SX, McLachlan GJ (2013d). “EMMIXuskew: An R Package for Fitting Mixtures of 
Multivariate Skew t Distributions via the EM Algorithm.” Journal of Statistical Software, 
55(12), 1-22. URL http://www.jstatsoft.org/v55/il2/. 

Lee SX, McLachlan GJ (2014b). “Maximum Likelihood Estimation for Finite Mixtures of 
Canonical Fundamental Skew t-Distributions: the Unification of the Unrestricted and Re¬ 
stricted Skew t-Mixture Models.” arXiv:1401.8182. 

Lee SX, McLachlan GJ (2016a). “Finite Mixtures of Canonical Fundamental Skew t- 
Distributions; The Unification of the Restricted and Unrestricted Skew t-Mixture Models.” 
Statistics and Computing, 26, 573-589. 

Lee SX, McLachlan GJ (2016b). “On mixture modelling with multivariate skew distributions.” 
In A Colubi, A Blanco, C Gatu (eds.). Proceedings of COMPSTAT 2016, pp. 137-147. 
The Hague: The International Statistical Institute/International Association for Statistical 
Computing. 

Lee SX, McLachlan GJ (2016c). “Unsupervised component-wise EM learning for finite mix¬ 
tures of skew t-distributions.” In Lecture Notes in Artificial Intelligence, volume 10086, pp. 
692-699. Berlin: Springer. 

Lee SX, McLachlan GJ, Pyne S (2014). “Supervised Classification of Flow Cytometric Samples 
via the Joint Clustering and Matching (JCM) Procedure.” arXiv:1411.2820. 

Lee SX, McLachlan GJ, Pyne S (2016c). “Modelling of Inter-sample Variation in Flow Cy¬ 
tometric Data with the Joint Clustering and Matching (JCM) Procedure.” Cytometry A. 
doi:10.1002/cyto.a.22789. 

Leisch F (2004). “FlexMix: A General Framework for Finite Mixture Models and Latent 
Class Regression in R.” Journal of Statistical Software, 11(8), 1-18. 

Lin TI (2010). “Robust Mixture Modeling using Multivariate Skew-t Distribution.” Statistics 
and Computing, 20, 343-356. 


Sharon Lee <fe Geoffrey McLachlan 


29 


Lin TI, McLachlan GJ, Lee SX (2016). “Extending Mixtures of Factor Models Using the 
Restricted Multivariate Skew-Normal Distribution.” Journal of Multivariate Analysis, 143, 
398-413. 

Lin TI, Wu PH, McLachlan GJ, Lee SX (2015). “A Robust Factor Analysis Model Using the 
Restricted Skew t-Distribution.” TEST, 24, 510-531. 

Lindsay BG (1995). Mixture Models: Theory, Geometry, and Applications. NSF-CBMS 
Regional Conference Series in probability and Statistics, Vol. 5 (Institute of Mathematical 
Statistics and the American Statistical Association), Alexandria, VA. 

McLachlan GJ, Basford KE (1988). Mixture Models: Inference and Applications. Marcel 
Dekker, New York. 

McLachlan GJ, Krishnan T (1997). The EM Algorithm and Extensions. John Wiley & Sons, 
Hokoben, New York. 

McLachlan GJ, Lee SX (2016). “Comment on ”On nomenclature for, and the relative merits 
of, two formulations of skew distributions” by A. Azzalini, R. Browne, M. Genton, and P. 
McNicholas.” Statistics and Probaility Letters, 116, 1-5. 

McLachlan GJ, Peel D (2000). Finite Mixture Models. Second edition. John Wiley & Sons, 
New York. 

Mengersen KL, Robert CP, Titterington DM (2011). Mixtures: Estimation and Applications. 
John Wiley & Sons, New York. 

Muthen B, Asparouhov T (2014). “Growth Mixture Modeling with Non-Normal Distribu¬ 
tions.” Statistics and Medicine, 34, 1041-1058. 

Prates MO, Cabral CRB, Lachos VH (2013). “mixsmsn: Fitting Finite Mixture of Scale 
Mixture of Skew-Normal Distributions.” Journal of Statistical Software, 54(12), 1-20. 

Pyne S, Hu X, Wang K, Rossin E, Lin TI, Maier LM, Baecher-Allan C, McLachlan GJ, 
Tamayo P, Hafler DA, De Jager PL, Mesirow JP (2009). “Automated High-Dimensional 
Flow Cytometric Data Analysis.” Proceedings of the National Academy of Sciences USA, 
106, 8519-8524. 

Pyne S, Lee S, McLachlan G (2015). “Nature and Man: The Goal of Bio-security in the 
Course of Rapid and Inevitable Human Development.” Journal of the Indian Society of 
Agricultural Statistics, 69, 117-125. 

Pyne S, Lee SX, Wang K, Irish J, Tamayo P, Nazaire MD, Duong T, Ng SK, Hafler D, 
Levy R, Nolan GP, Mesirov J, McLachlan G (2014). “Joint Modeling and Registration of 
Cell Populations in Cohorts of High-Dimensional Flow Cytometric Data.” PLOS ONE, 9, 
el00334. doi:10.1371/journal.pone.0100334. 

Riggi S, Ingrassia S (2013). “A Model-Based Clustering Approach for Mass Composition 
Analysis of High Energy Cosmic Rays.” Astroparticle Physics, 48, 86-96. 

Rossin E, Lin TI, Ho HJ, Mentzer SJ, Pyne S (2011). “A Framework for Analytical Char¬ 
acterization of Monoclonal Antibodies Based on Reactivity Profiles in Different Tissues.” 
Bioinformatics, 27, 2746-2753. 


30 


EMMIXcskew: Finite Mixtures of Canonical Fundamental Skew t-distributions 


Sahu SK, Dey DK, Branco MD (2003). “A New Class of Multivariate Skew Distributions 
with Applications to Bayesian Regression Models.” The Canadian Journal of Statistics, 
31,129-150. 

Schaarschmidt F, Hofmann M, Jaki T, Griin B, Hothorn LA (2015). “Statistical Approaches 
for the Determination of Cut Points in Anti-Drug Antibody Bioassays.” Journal of Im¬ 
munological Methods, 25, 295-306. 

Schwarz G (1978). “Estimating the Dimension of a Model.” Annals of Statistics, 6, 461-464. 

Scrucca L, Fop M, Murphy TB, Raftery AE (2016). “mclust 5: Clustering, Classification and 
Density Estimation Using Gaussian Finite Mixture Models.” The R Journal, 8(1). 

Soltyk S, Gupta R (2011). “Application of the Multivariate Skew Normal Mixture Model 
with the EM Algorithm to Value-at-Risk.” In F Chan, D Marinova, RS Anderssen (eds.), 
MODSIM 2011 (19th International Congress on Modelling and Simulation), pp. 1638-1644. 
Perth, Australia. 

Titterington DM, Smith AFM, Markov UE (1985). Statistical Analysis of Finite Mixture 
Distributions. John Wiley &: Sons, New York. 

Tortora C, Browne RP, Eranczak BC, McNicholas PD (2015). MixGHD: Model Based 
Clustering, Classification and Discriminant Analysis Using the Mixture of Generalized Hy¬ 
perbolic Distributions. R package version 1.7, URL http://cran.r-project.org/web/ 
packages/MixGHD. 

Wang K, McLachlan GJ, Ng SK, Peel D (2009). EMMIXskew: EM Algorithm for Mixture 
of Multivariate Skew Normal/t Distributions. R package version 1.0.20, URL http://GRAN. 
R-project.org/package=EMMIXskew. 


AfRliation: 

Sharon X. Lee 

School of Mathematics and Physics 
University of Queensland 
Brisbane, Australia 
E-mail: s.leellOuq.edu.au 

Geoffrey J. McLachlan 

School of Mathematics and Physics 

University of Queensland 

Brisbane, Australia 

E-mail: g.mclachlan@uq.edu.au 


